Geospatial/SAE workshop

Raster files

Josh Merfeld

October 2024

Rasters

  • We’ve discussed shapefiles
    • Now let’s talk about rasters!

  • Rasters are a different type of geospatial data
    • They are made up of a grid of cells
    • Each cell has a value

Example raster grid - how much info do we need?

  • Here’s a grid.
    • How many points do we need?

Example raster grid - how much info do we need?

  • Need to know location of one grid cell…
    • And the size of each grid!

How much info do we need?

  • In other words, we do not need a point for every raster cell

  • We just need to know:

    • The location of one cell
    • The size of each cell
      • This is called the resolution of the raster

  • Example:

    • I know the first grid cell in bottom left is at (0, 0)
    • I know each grid cell is 1 meter by 1 meter (the resolution)
    • Then I know the exact location of every single grid cell

Population in Cotonou, Benin

  • What are the white values?

Population in Cotonou, Benin

  • Here’s the information for this raster
    • What’s the resolution? What are the units?
class       : SpatRaster 
dimensions  : 34, 46, 1  (nrow, ncol, nlyr)
resolution  : 0.0008333333, 0.0008333333  (x, y)
extent      : 2.39375, 2.432083, 6.342917, 6.37125  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : beninpop.tif 
name        : beninpop 

Rasters

  • Rasters are defined by the grid layout and the resolution
    • Grid cells are sometimes called pixels (just like images, which are often rasters!)

  • There are many different file types for rasters
    • .tif or .tiff (one of the most common)
    • .nc (NetCDF, common for very large raster data)
    • Image files, e.g. png, jpg, etc.

Reading rasters in R

  • Reading rasters is also quite easy!
    • Going to use the terra package for it
      • Note: can use terra for shapefiles, too
    • rastersdata/beninpop.tif is a raster of population counts in Benin
Code
library(terra)

# this is the raster for Cotonou, Benin
cotonou <- rast("rastersdata/beninpop.tif")
cotonou
class       : SpatRaster 
dimensions  : 34, 46, 1  (nrow, ncol, nlyr)
resolution  : 0.0008333333, 0.0008333333  (x, y)
extent      : 2.39375, 2.432083, 6.342917, 6.37125  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : beninpop.tif 
name        : beninpop 

Plotting rasters

  • Plotting rasters only with terra is a bit of a pain
    • Can’t use ggplot
    • So, I load another package that lets me use ggplot with rasters
      • tidyterra
Code
library(tidyterra)

ggplot() +
  geom_spatraster(data = cotonou) + 
  theme_bw() +
  theme(plot.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb")) + 
  theme(legend.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb"))

Making it nicer

Code
library(tidyterra)

ggplot() +
  geom_spatraster(data = cotonou) + 
  # distiller is for continuous values
  # but we can use palettes!
  # I like spectral a lot
  scale_fill_distiller("Population\ncount", 
    palette = "Spectral", na.value = "white") +
  theme_bw() +
  labs(subtitle = "Population in Cotonou, Benin") + 
  theme(plot.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb")) + 
  theme(legend.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb"))

Extracting raster data to shapefiles

  • Let’s go back to our use case:
    • We want to estimate a sub-area model at the EA level in Malawi
    • This means we need to extract raster data to the EA level
    • We can do this with terra, sf, and exactextractr
      • terra has its own method, but i find exactextractr to be MUCH faster

  • Let’s start by looking at the raster I’ve uploaded to the rastersdata: mwpop.tif

Give it a try

  • Try to load it into R using terra, then plot it with tidyterra and ggplot
Code
tif <- rast("rastersdata/mwpop.tif")

ggplot() +
  geom_spatraster(data = tif) + 
  scale_fill_distiller("Population\ncount", 
    palette = "Spectral", na.value = "white") +
  theme_bw() +
  labs(subtitle = "Population in Northern Malawi") + 
  theme(plot.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb")) + 
  theme(legend.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb"))

Give it a try

  • I actually don’t like that map! It’s too hard to see because of all the low values.
  • So let’s take logs, instead!
    • Note that all the zeros become missing (can’t log zero)
Code
tif <- rast("rastersdata/mwpop.tif")

ggplot() +
  geom_spatraster(data = log(tif)) + 
  scale_fill_distiller("Population\ncount (log)", 
    palette = "Spectral", na.value = "white") +
  theme_bw() +
  labs(subtitle = "Population in Northern Malawi") + 
  theme(plot.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb")) + 
  theme(legend.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb"))

We want to extract the .tif values to the .shp

Let’s do it with exactextractr

Code
library(exactextractr)

tif <- rast("rastersdata/mwpop.tif")
adm4 <- read_sf("rastersdata/mw4.shp")
# make sure they are in the same CRS! (they already are, but just in case)
# st_transform is for the sf object
adm4 <- st_transform(adm4, crs = crs(tif))

# extract the raster values to the shapefile
# we are going to SUM, and add the EA_CODE from the shapefile to the result
extracted <- exact_extract(tif, adm4, fun = "sum", append_cols = "EA_CODE")
Code
head(extracted)
   EA_CODE       sum
1 10507801 1068.7787
2 10507072  695.8013
3 10507010  938.1139
4 10507001  749.6960
5 10507009  597.5432
6 10507033  474.3934

Now we can join the extracted data to the shapefile

Code
# join
adm4 <- adm4 |>
  left_join(extracted, by = "EA_CODE")

# plot it!
ggplot() +
  geom_sf(data = adm4, aes(fill = sum), 
    color = "black", lwd = 0.01) +
  scale_fill_distiller("Population\ncount", 
    palette = "Spectral", na.value = "white") +
  theme_bw() +
  labs(subtitle = "Population in EAs") + 
  theme(plot.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb")) + 
  theme(legend.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb"))

Now it’s your turn

  • Here’s your task:

Now it’s your turn

  • Here’s your task:
    • Search for “worldpop population counts”
    • Scroll down the page, click on “unconstrained individual countries 2000-2020 UN adjusted (1km resolution)
    • Then, search for a country (maybe yours?)

Now it’s your turn

  • Here’s your task:
    • Search for “worldpop population counts”
    • Scroll down the page, click on “unconstrained individual countries 2000-2020 UN adjusted (1km resolution)
    • Then, search for a country (maybe yours?)
    • Click on “Data & Resources” for 2020
    • Scroll down to the bottom of the page and download the .tif

Now it’s your turn

  • Load the .tif into R using terra
  • Plot the raster using tidyterra and ggplot
    • Make it look nice!

Let’s keep going!

  • Now you need to find a shapefile for the same country
  • This will be a bit less straightforward
    • Search for “shapefile COUNTRY humdata”
    • You should find a link to the Humanitarian Data Exchange
    • Click on it and see if it has shapefiles for your country of choice
    • If so, download a shapefile (it can be at a higher admin level)
      • If not, raise your hand and I’ll come help you find a shapefile
    • Load it into R and plot it!

One last thing

  • You have the population tif and the shapefile
  • Extract the population data (using sum, don’t forget!) to the shapefile
    • Use append_cols and make sure you choose the correct identifier!
  • Join the data to the shapefile
  • Plot the shapefile with the population data
    • Make it look nice!

What can you do with that data?

  • Now you have a shapefile with population data
  • You can save it as a .csv and use it in your analysis!
    • We’ll get to this point eventually.
    • We will also discuss adding the survey data and then estimating a sub-area model

Where can you find rasters?

  • Depending on the variable, rasters are sometimes quite easy to find!
    • We already saw one example: WorldPop (population counts)

  • There are two large online repositories:

Google Earth Engine

  • Google Earth Engine has a lot of data.

  • Let’s see some examples

The actual code is much easier to use than GEE!

Code
library(GeoLink)
library(sf)

adm4 <- read_sf("mw4.shp")
# CHIRPS is rainfall data
adm4_chirps <- geolink_chirps(time_unit = "month",
  start_date = "2019-01-01",
  end_date = "2019-03-01",
  shp_dt = adm4,
  extract_fun = "mean")
summary(adm4_chirps)
  • Keep an eye on this package! It will be very useful when it has more datasets

Google Earth Engine

  • First things first: you need an account!

  • Go to https://earthengine.google.com/ and sign up

    • Top-right corner: Get Started
    • Next page: Create account

  • I’ll give you all a couple minutes to get this set up. Let me know if you have problems

Let’s look at a dataset

  • On the https://earthengine.google.com/ page:
    • Click View all datasets (near the top)
    • Search for lights
    • We want VIIRS Nighttime Day/Night Annual Band Composites V2.1

Basic information about the dataset

Raster bands - They can have more than one!

We need to download python… sorry!

  • The first time you use rgeedim, you will need to install python
    • The easiest way is to search download miniconda
    • One of the first links should be https://docs.anaconda.com/miniconda/
    • Go down to “Latest Miniconda installer links” and select the correct link for your platform (e.g. Windows)
    • Once it finishes downloading, please go do your downloads folder and double click to install

  • Let’s take five minutes to download and install python

Downloading the data is… a pain

  • Actually getting the data is a bit of a pain
    • Unless you know Javascript!

  • A lot of people use libraries in R (or python) to download data, instead.
    • All of them are a bit cumbersome.
      • Especially true in R, because we need to use python!
    • We are going to use rgeedim
    • Go ahead and install it using install.packages("rgeedim")
    • Load it using library(rgeedim)
      • Type Yes when asked to create a default Python environment

The code

Code
library(rgeedim)
gd_install()
gd_authenticate(auth_mode = "notebook")
  • After gd_authenticate(), your browser should open.
    • You’ll need to sign in to your Google account
    • Continue through the prompts and make sure you select all access

The code

Code
library(rgeedim)
gd_install()
gd_authenticate(auth_mode = "notebook")
  • After gd_authenticate(), your browser should open.
    • You’ll need to sign in to your Google account
    • Continue through the prompts and make sure you select all access

The code

  • You’ll arrive at this page.
  • Click the Copy button
Code
library(rgeedim)
gd_install()
gd_authenticate(auth_mode = "notebook")
  • Then go back to RStudio, and paste (ctrl + v) the code into the console

The code

Code
library(rgeedim)
#gd_install() # You SHOULD NOT need to do this on each new session
gd_authenticate(auth_mode = "notebook") # need to do this
gd_initialize() # and you need to do this
  • After you do gd_install() once, you should be good
    • You will need to do gd_authenticate() and gd_initialize() each time you start a new session

Downloading the data - still not straightforward!

  • First, we need to create a “bounding box”
    • This is the area of the globe we want to search
    • We will use the Malawi shapefile for this
    • The “bounding box” is a rectangle that completely contains the shapefile
Code
# load shapefile
malawi <- read_sf("rastersdata/mw4.shp")
# this creates the bounding box
bbox <- st_bbox(malawi)
bbox
      xmin       ymin       xmax       ymax 
 32.942417 -12.740578  34.758878  -9.367346 

Basic information about the dataset

  • Remember I said we’d need this again?

Image collections vs. images

  • One key thing to understand about GEE is the difference between image collections and images

  • An image collection is what it sounds like: a collection of images

    • The key is that we won’t download image collections
    • We’ll download individual images
    • So we need to find the images!

Get images from the collection

Code
x <- gd_collection_from_name("NOAA/VIIRS/DNB/ANNUAL_V21") |>
  gd_search(region = bbox)
gd_properties(x)
                                  id                date
1 NOAA/VIIRS/DNB/ANNUAL_V21/20130101 2013-01-01 07:00:00
2 NOAA/VIIRS/DNB/ANNUAL_V21/20140101 2014-01-01 07:00:00
3 NOAA/VIIRS/DNB/ANNUAL_V21/20150101 2015-01-01 07:00:00
4 NOAA/VIIRS/DNB/ANNUAL_V21/20160101 2016-01-01 07:00:00
5 NOAA/VIIRS/DNB/ANNUAL_V21/20170101 2017-01-01 07:00:00
6 NOAA/VIIRS/DNB/ANNUAL_V21/20180101 2018-01-01 07:00:00
7 NOAA/VIIRS/DNB/ANNUAL_V21/20190101 2019-01-01 07:00:00
8 NOAA/VIIRS/DNB/ANNUAL_V21/20200101 2020-01-01 07:00:00
9 NOAA/VIIRS/DNB/ANNUAL_V21/20210101 2021-01-01 07:00:00
  • The survey data I have from Malawi is 2019/2020, so let’s download the 2019-01-01 data
    • We want to use the id: NOAA/VIIRS/DNB/ANNUAL_V21/20190101

We can FINALLY download the raster!

Code
x <- gd_image_from_id("NOAA/VIIRS/DNB/ANNUAL_V21/20190101") |>
  gd_download(
    filename = "temp.tif",
    region = bbox, # region is our bbox
    scale = 500, # resolution of raster is only 500, so no reason to go lower
    crs = 'EPSG:4326', # lat/lon
    overwrite = TRUE, # overwrite if it exists
    silent = FALSE
  )
# we downloaded the raster and called it x
# so let's load it using terra!
x <- rast(x)
# here it is!
x
class       : SpatRaster 
dimensions  : 752, 405, 9  (nrow, ncol, nlyr)
resolution  : 0.004491576, 0.004491576  (x, y)
extent      : 32.94122, 34.76031, -12.7426, -9.364937  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : temp.tif 
names       : average, avera~asked, cf_cvg, cvg, maximum, median, ... 

Quick note: we downloaded many bands!

Code
names(x)
[1] "average"        "average_masked" "cf_cvg"         "cvg"           
[5] "maximum"        "median"         "median_masked"  "minimum"       
[9] "FILL_MASK"     
Code
# but we really only want average nightlights
# so here's how you can download just the average
x <- gd_image_from_id("NOAA/VIIRS/DNB/ANNUAL_V21/20190101") |>
  gd_download(
    filename = "temp.tif",
    region = bbox, # region is our bbox
    scale = 500, # resolution of raster is only 500, so no reason to go lower
    crs = 'EPSG:4326', # lat/lon
    overwrite = TRUE, # overwrite if it exists
    silent = FALSE,
    bands = list("average"),
  )
# we downloaded the raster and called it x
# so let's load it using terra!
x <- rast(x)
# here it is!
x
class       : SpatRaster 
dimensions  : 752, 405, 1  (nrow, ncol, nlyr)
resolution  : 0.004491576, 0.004491576  (x, y)
extent      : 32.94122, 34.76031, -12.7426, -9.364937  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : temp.tif 
name        : average 

What does it look like?

Code
adm4 <- read_sf("rastersdata/mw4.shp")
ggplot() +
  geom_spatraster(data = x) +
  scale_fill_distiller("Nightlights", 
    palette = "Spectral") +
  geom_sf(data = adm4, 
    color = "white",
    lwd = 0.01,
    alpha = 0.5,
    fill = "transparent") +
  theme_bw() +
  labs(subtitle = "Nightlights in Malawi") + 
  theme(plot.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb")) + 
  theme(legend.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb"))
  • Note what the “bounding box” does!
    • st_bbox as a reminder

We want to extract NTL to the shapefile

Code
adm4 <- read_sf("rastersdata/mw4.shp")
# exact_extract
library(exactextractr)
ntlextract <- exact_extract(x, adm4, fun = "mean", append_cols = "EA_CODE")
head(ntlextract)
# save it!
write_csv(ntlextract |> rename(ntl = mean), "rastersdata/mwntlEAs.csv")

Now it’s your turn!

  • We want to download NDVI data for Malawi
    • We want the Terra Vegetation Indices 16-Day Global 500m data (you can just search this)
    • Download the first observation from 2019
    • Extract it to the mw4 shapefile!
Code
# load shapefile
malawi <- read_sf("rastersdata/mw4.shp")
# this creates the bounding box
bbox <- st_bbox(malawi)

x <- gd_collection_from_name("MODIS/061/MOD13A1") |>
  gd_search(region = bbox)
gd_properties(x)
# the ID for the first image of 2019 is MODIS/061/MOD13A1/2019_01_01

Let’s download that id

Code
x <- gd_image_from_id("MODIS/061/MOD13A1/2019_01_01") |>
  gd_download(
    filename = "temp.tif",
    region = bbox, # region is our bbox
    scale = 500, # resolution
    crs = 'EPSG:4326', # lat/lon
    overwrite = TRUE, # overwrite if it exists
    bands = list("NDVI") # only download NDVI
  )
x <- rast(x) # load raster
ndviextract <- exact_extract(x, malawi, fun = "mean", append_cols = "EA_CODE")
malawi <- malawi |>
  left_join(ndviextract |> rename(ndvi = mean), by = "EA_CODE")
ggplot() +
  geom_sf(data = malawi, aes(fill = ndvi), lwd = 0.01,) +
  scale_fill_distiller("NDVI", 
    palette = "Greens", direction = 1) +
  theme_bw()

This returns a LOT of results - search by date!

Code
# load shapefile
malawi <- read_sf("rastersdata/mw4.shp")
# this creates the bounding box
bbox <- st_bbox(malawi)

# so let's filter by date!
x <- gd_collection_from_name("MODIS/061/MOD13A1") |>
  gd_search(region = bbox,
    start_date = "2019-01-01",
    end_date = "2019-12-31")
gd_properties(x)
                             id                date
1  MODIS/061/MOD13A1/2019_01_01 2019-01-01 07:00:00
2  MODIS/061/MOD13A1/2019_01_17 2019-01-17 07:00:00
3  MODIS/061/MOD13A1/2019_02_02 2019-02-02 07:00:00
4  MODIS/061/MOD13A1/2019_02_18 2019-02-18 07:00:00
5  MODIS/061/MOD13A1/2019_03_06 2019-03-06 07:00:00
6  MODIS/061/MOD13A1/2019_03_22 2019-03-22 07:00:00
7  MODIS/061/MOD13A1/2019_04_07 2019-04-07 07:00:00
8  MODIS/061/MOD13A1/2019_04_23 2019-04-23 07:00:00
9  MODIS/061/MOD13A1/2019_05_09 2019-05-09 07:00:00
10 MODIS/061/MOD13A1/2019_05_25 2019-05-25 07:00:00
11 MODIS/061/MOD13A1/2019_06_10 2019-06-10 07:00:00
12 MODIS/061/MOD13A1/2019_06_26 2019-06-26 07:00:00
13 MODIS/061/MOD13A1/2019_07_12 2019-07-12 07:00:00
14 MODIS/061/MOD13A1/2019_07_28 2019-07-28 07:00:00
15 MODIS/061/MOD13A1/2019_08_13 2019-08-13 07:00:00
16 MODIS/061/MOD13A1/2019_08_29 2019-08-29 07:00:00
17 MODIS/061/MOD13A1/2019_09_14 2019-09-14 07:00:00
18 MODIS/061/MOD13A1/2019_09_30 2019-09-30 07:00:00
19 MODIS/061/MOD13A1/2019_10_16 2019-10-16 07:00:00
20 MODIS/061/MOD13A1/2019_11_01 2019-11-01 07:00:00
21 MODIS/061/MOD13A1/2019_11_17 2019-11-17 07:00:00
22 MODIS/061/MOD13A1/2019_12_03 2019-12-03 07:00:00
23 MODIS/061/MOD13A1/2019_12_19 2019-12-19 07:00:00

Let’s look at the dates

Code
gd_properties(x)$date
 [1] "2019-01-01 07:00:00 +07" "2019-01-17 07:00:00 +07"
 [3] "2019-02-02 07:00:00 +07" "2019-02-18 07:00:00 +07"
 [5] "2019-03-06 07:00:00 +07" "2019-03-22 07:00:00 +07"
 [7] "2019-04-07 07:00:00 +07" "2019-04-23 07:00:00 +07"
 [9] "2019-05-09 07:00:00 +07" "2019-05-25 07:00:00 +07"
[11] "2019-06-10 07:00:00 +07" "2019-06-26 07:00:00 +07"
[13] "2019-07-12 07:00:00 +07" "2019-07-28 07:00:00 +07"
[15] "2019-08-13 07:00:00 +07" "2019-08-29 07:00:00 +07"
[17] "2019-09-14 07:00:00 +07" "2019-09-30 07:00:00 +07"
[19] "2019-10-16 07:00:00 +07" "2019-11-01 07:00:00 +07"
[21] "2019-11-17 07:00:00 +07" "2019-12-03 07:00:00 +07"
[23] "2019-12-19 07:00:00 +07"
  • What should we download?
    • NDVI is a vegetation index, which means it varies quite a bit throughout the year
    • We could take average NDVI throughout the year
    • Or we could take NDVI at a specific time of year
    • Or we could take the max… or the min… or all of the above!
  • Let’s download one raster PER MONTH
    • How?

Let’s look at the dates

  • There are 23 dates
  • This code is a bit complicated, so let me explain
Code
# get the dates
dates <- gd_properties(x)$date
# here are the months
months <- month(dates)
ids <- c() # this creates an empty vector
for (m in 1:12){ # this "for loop" loops through each month (1 through 12)
  ids <- c(ids, which(months == m)[1]) # it then takes the LOCATION of the FIRST VALUE equal to m
}
ids <- gd_properties(x)$id[ids] # this gets the image ids at those locations
ids # Now we have all the ids we want to download!
 [1] "MODIS/061/MOD13A1/2019_01_01" "MODIS/061/MOD13A1/2019_02_02"
 [3] "MODIS/061/MOD13A1/2019_03_06" "MODIS/061/MOD13A1/2019_04_07"
 [5] "MODIS/061/MOD13A1/2019_05_09" "MODIS/061/MOD13A1/2019_06_10"
 [7] "MODIS/061/MOD13A1/2019_07_12" "MODIS/061/MOD13A1/2019_08_13"
 [9] "MODIS/061/MOD13A1/2019_09_14" "MODIS/061/MOD13A1/2019_10_16"
[11] "MODIS/061/MOD13A1/2019_11_01" "MODIS/061/MOD13A1/2019_12_03"

Here’s how I would do this

Code
adminareas <- malawi |>
  as_tibble() |>
  select(EA_CODE)

for (i in 1:length(ids)){
  x <- gd_image_from_id(ids[i]) |>
  gd_download(
    filename = "temp.tif",
    region = bbox, # region is our bbox
    scale = 500, # resolution
    crs = 'EPSG:4326', # lat/lon
    overwrite = TRUE, # overwrite if it exists
    bands = list("NDVI") # only download NDVI
  )
  x <- rast(x) # load raster
  ndviextract <- exact_extract(x, malawi, fun = "mean", append_cols = "EA_CODE")
  colnames(ndviextract) <- c("EA_CODE", paste0("NDVI_", i))
  adminareas <- adminareas |>
    left_join(ndviextract, by = "EA_CODE")
}

But that’s difficult, so I’ve uploaded the data!

  • That’s a hard thing to wrap your head around if you’re new to R
    • If you want to download them, you can just do them one at a
    • I’ve uploaded the data:
      • rastersdata/ndviallmonths.csv
      • Go ahead and load it and take a look at it
Code
ndviall <- read_csv("rastersdata/ndviallmonths.csv")
ndviall
# A tibble: 3,212 × 13
    EA_CODE NDVI_1 NDVI_2 NDVI_3 NDVI_4 NDVI_5 NDVI_6 NDVI_7 NDVI_8 NDVI_9
      <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
 1 10507801  3950.  6635.  6629.  6212.  5028.  3953.  3443.  2956.  2611.
 2 10507072  4812.  7246.  7090.  6210.  5709.  4523.  3966.  3279.  2714.
 3 10507010  4328.  6140.  6316.  6134.  4481.  3909.  3520.  3067.  2735.
 4 10507001  5716.  7187.  7032.  6976.  6409.  5312.  4692.  3989.  3673.
 5 10507009  4952.  6510.  6674.  6728.  5407.  4653.  4141.  3532.  3107.
 6 10507033  4291.  6216.  6299.  6382.  4868.  4280.  3812.  3443.  2856.
 7 10507032  5049.  6668.  6558.  6747.  5073.  4534.  3871.  3384.  2927.
 8 10507002  4545.  6682.  6510   6589.  5348.  4460.  3932.  3313.  2946.
 9 10507003  4678.  6269.  6066.  5943.  5104.  4139.  3696.  3211.  2940.
10 10507008  4476.  6459.  6445.  6530.  4930.  4276.  3827.  3200.  2892.
# ℹ 3,202 more rows
# ℹ 3 more variables: NDVI_10 <dbl>, NDVI_11 <dbl>, NDVI_12 <dbl>

Let’s create some new variables

  • Let’s create three new NDVI variables:
    • Annual minimum
    • Annual maximum
    • Annual average
  • New R function: apply!
Code
# create new variable called min. the "1" means across ROWS.
ndviall$ndvimin <- apply(ndviall |> select(starts_with("NDVI")), 1, min, na.rm = TRUE)
# max
ndviall$ndvimax <- apply(ndviall |> select(starts_with("NDVI")), 1, max, na.rm = TRUE)
# mean
ndviall$ndvimean <- apply(ndviall |> select(starts_with("NDVI")), 1, mean, na.rm = TRUE)
# just keep those
ndviall <- ndviall |>
  select(EA_CODE, ndvimin, ndvimax, ndvimean)
# save it!
write_csv(ndviall, "rastersdata/ndviclean.csv")

One last step

  • We need to get the codes for the survey data!
  • I’ve uploaded the survey data for Malawi
    • rastersdata/ihs5_consumption_aggregate.dta
    • Go ahead and load it (remember to use haven)
Code
library(haven)
ihs5 <- read_dta("rastersdata/ihs5_consumption_aggregate.dta")
head(ihs5)
# A tibble: 6 × 7
  case_id      ea_id       TA adulteq hh_wgt rexpaggpc poor        
  <chr>        <chr>    <dbl>   <dbl>  <dbl>     <dbl> <dbl+lbl>   
1 101011000014 10101100 10101    3.47   93.7    99918. 1 [Poor]    
2 101011000023 10101100 10101    3.82   93.7   169323. 0 [Non-poor]
3 101011000040 10101100 10101    2.67   93.7    93644. 1 [Poor]    
4 101011000071 10101100 10101    4.79   93.7   452758. 0 [Non-poor]
5 101011000095 10101100 10101    4.31   93.7   183333. 0 [Non-poor]
6 101011000115 10101100 10101    2      93.7   420656. 0 [Non-poor]

One last step

Code
head(ihs5)
# A tibble: 6 × 7
  case_id      ea_id       TA adulteq hh_wgt rexpaggpc poor        
  <chr>        <chr>    <dbl>   <dbl>  <dbl>     <dbl> <dbl+lbl>   
1 101011000014 10101100 10101    3.47   93.7    99918. 1 [Poor]    
2 101011000023 10101100 10101    3.82   93.7   169323. 0 [Non-poor]
3 101011000040 10101100 10101    2.67   93.7    93644. 1 [Poor]    
4 101011000071 10101100 10101    4.79   93.7   452758. 0 [Non-poor]
5 101011000095 10101100 10101    4.31   93.7   183333. 0 [Non-poor]
6 101011000115 10101100 10101    2      93.7   420656. 0 [Non-poor]
  • Note that in this case, we already have the ea codes in the survey data
    • So we can just join the two datasets!

Collapse to EA_CODE

Code
library(stats) # this is for weighted.mean
ihs5ea <- ihs5 |>
  rename(EA_CODE = ea_id) |>
  group_by(EA_CODE) |>
  # Note that this is a weighted mean!
  summarize(poor = stats::weighted.mean(x = poor, w = hh_wgt*adulteq, na.rm = TRUE), # weighted mean
    total_weights = sum(hh_wgt*adulteq, na.rm = TRUE), # sum total weights
    total_obs = n()) # total observations (households) in the EA
head(ihs5ea)
# A tibble: 6 × 4
  EA_CODE    poor total_weights total_obs
  <chr>     <dbl>         <dbl>     <int>
1 10101100 0.230          5690.        16
2 10101101 0.444          7614.        16
3 10101102 0.0947         9441.        16
4 10101103 0.376          7486.        16
5 10101104 0.600          9147.        16
6 10101105 0.497          5351.        16

But what if you don’t have an identifier?

  • Sometimes you have GPS coordinates but not matching identifiers
    • Good news! We can use geospatial tools to match the coordinates to the shapefile
Code
ihs5coords <- read_dta("rastersdata/householdgeovariables_ihs5.dta")
# turn it into an sf object
ihs5coords <- ihs5coords |>
  filter(!is.na(ea_lon_mod)) |> # get rid of any missing values (can't use them)
  st_as_sf(coords = c("ea_lon_mod", "ea_lat_mod"), crs = 4326)
head(ihs5coords)
Simple feature collection with 6 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 33.23917 ymin: -9.70057 xmax: 33.23917 ymax: -9.70057
Geodetic CRS:  WGS 84
# A tibble: 6 × 2
  case_id                 geometry
  <chr>                <POINT [°]>
1 101011000014 (33.23917 -9.70057)
2 101011000023 (33.23917 -9.70057)
3 101011000040 (33.23917 -9.70057)
4 101011000071 (33.23917 -9.70057)
5 101011000095 (33.23917 -9.70057)
6 101011000115 (33.23917 -9.70057)

Here’s how it looks

Code
ihs5coords <- read_dta("rastersdata/householdgeovariables_ihs5.dta")
# turn it into an sf object
ihs5coords <- ihs5coords |>
  filter(!is.na(ea_lon_mod)) |> # get rid of any missing values (can't use them)
  st_as_sf(coords = c("ea_lon_mod", "ea_lat_mod"), crs = 4326)
admin4 <- read_sf("rastersdata/mw4.shp")
ggplot() + 
  geom_sf(data = admin4, color = "transparent", lwd = 0.01) +
  geom_sf(data = ihs5coords, color = "red") +
  theme_bw()

We can join them using st_join

Code
ihs5coords <- read_dta("rastersdata/householdgeovariables_ihs5.dta")
# turn it into an sf object
ihs5coords <- ihs5coords |>
  filter(!is.na(ea_lon_mod)) |> # get rid of any missing values (can't use them)
  st_as_sf(coords = c("ea_lon_mod", "ea_lat_mod"), crs = 4326)
admin4 <- read_sf("rastersdata/mw4.shp")
ihs5coords <- st_join(ihs5coords, admin4)
# and that's it!
head(ihs5coords)
Simple feature collection with 6 features and 4 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 33.23917 ymin: -9.70057 xmax: 33.23917 ymax: -9.70057
Geodetic CRS:  WGS 84
# A tibble: 6 × 5
  case_id                 geometry DIST_CODE EA_CODE  TA_CODE
  <chr>                <POINT [°]> <chr>     <chr>    <chr>  
1 101011000014 (33.23917 -9.70057) 101       10101006 10101  
2 101011000023 (33.23917 -9.70057) 101       10101006 10101  
3 101011000040 (33.23917 -9.70057) 101       10101006 10101  
4 101011000071 (33.23917 -9.70057) 101       10101006 10101  
5 101011000095 (33.23917 -9.70057) 101       10101006 10101  
6 101011000115 (33.23917 -9.70057) 101       10101006 10101  

Now we need to add the EA_CODE to the poverty data

Code
ihs5 <- read_dta("rastersdata/ihs5_consumption_aggregate.dta")
ihs5 <- ihs5 |>
  left_join(ihs5coords, by = "case_id")
# collapse to EA_CODE
ihs5ea <- ihs5 |>
  group_by(EA_CODE) |>
  # Note that this is a weighted mean!
  summarize(poor = stats::weighted.mean(x = poor, w = hh_wgt*adulteq, na.rm = TRUE), # weighted mean
    total_weights = sum(hh_wgt*adulteq, na.rm = TRUE), # sum total weights
    total_obs = n()) # total observations (households) in the EA
head(ihs5ea)
# A tibble: 6 × 4
  EA_CODE    poor total_weights total_obs
  <chr>     <dbl>         <dbl>     <int>
1 10101006 0.230          5690.        16
2 10101011 0.444          7614.        16
3 10101027 0.0947         9441.        16
4 10101033 0.376          7486.        16
5 10101039 0.600          9147.        16
6 10101054 0.497          5351.        16
Code
# save it!
write_csv(ihs5ea, "rastersdata/ihs5ea.csv")

One last set of data: MOSAIKS

One last set of data: MOSAIKS

  • MOSAIKS is a dataset that has a lot of different variables
    • These variables were constructed by the authors from satellite imagery
    • You can download all of these features using their website
    • https://api.mosaiks.org - you’ll have to register
    • The data is quite large, so I’ve downloaded and uploaded it for you
      • It’s a random selection of 500 variables for EAs in Norther Malawi
      • rastersdata/mosaikvars.csv
      • Note that I used st_join to match it to the shapefile!

Rolf et al. (2021)

Putting it all together

What have we downloaded?

  • We have:
    • Population data from WorldPop
    • Nightlights data from GEE
    • NDVI data from GEE
    • MOSAIKS data
    • Survey data from Malawi

  • We have everything we need to estimate a simple SAE model using geospatial data!
    • Note: in practice, I would use even more predictors, but this is a good start

First, let’s load all the data

Code
# Population 
pop <- read_csv("rastersdata/mwpopEAs.csv")
# Nightlights
ntl <- read_csv("rastersdata/mwntlEAs.csv")
# NDVI
ndvi <- read_csv("rastersdata/ndviclean.csv")
# mosaiks
mosaik <- read_csv("rastersdata/mosaikvars.csv")
# survey data
ihs5ea <- read_csv("rastersdata/ihs5ea.csv")
# all EAs
adm4 <- read_sf("rastersdata/mw4.shp")
# no geometry, just EA_CODE
adm4 <- as_tibble(adm4) |>
  dplyr::select(EA_CODE, TA_CODE)

Now, we join it all!

Code
adm4 <- adm4 |>
  left_join(ihs5ea, by = "EA_CODE") |> # add survey data
  left_join(pop, by = "EA_CODE") |> # add pop
  left_join(ntl, by = "EA_CODE") |> # add nightlights
  left_join(ndvi, by = "EA_CODE") |> # add ndvi
  left_join(mosaik, by = "EA_CODE") |> # add mosaik
  
head(adm4)
  • Oh no! This throws an error!
    • x$EA_CODE is a <character>
    • y$EA_CODE is a <double>
    • What’s going on?

Now, we join it all!

Code
adm4 <- adm4 |>
  mutate(EA_CODE = as.numeric(EA_CODE)) |>
  left_join(ihs5ea |> mutate(EA_CODE = as.numeric(EA_CODE)), by = "EA_CODE") |> # add survey data
  left_join(pop |> mutate(EA_CODE = as.numeric(EA_CODE)), by = "EA_CODE") |> # add pop
  left_join(ntl |> mutate(EA_CODE = as.numeric(EA_CODE)), by = "EA_CODE") |> # add nightlights
  left_join(ndvi |> mutate(EA_CODE = as.numeric(EA_CODE)), by = "EA_CODE") |> # add ndvi
  left_join(mosaik |> mutate(EA_CODE = as.numeric(EA_CODE)), by = "EA_CODE") # add mosaik
  
head(adm4)
# A tibble: 6 × 510
   EA_CODE TA_CODE  poor total_weights total_obs   pop   ntl ndvimin ndvimax
     <dbl> <chr>   <dbl>         <dbl>     <dbl> <dbl> <dbl>   <dbl>   <dbl>
1 10507801 10507      NA            NA        NA 1069. 0.205   2546.   6635.
2 10507072 10507      NA            NA        NA  696. 0.129   2409.   7246.
3 10507010 10507      NA            NA        NA  938. 0.135   2579.   6316.
4 10507001 10507      NA            NA        NA  750. 0.129   3200.   7187.
5 10507009 10507      NA            NA        NA  598. 0.146   3107.   6728.
6 10507033 10507      NA            NA        NA  474. 0.139   2856.   6382.
# ℹ 501 more variables: ndvimean <dbl>, mosaik1 <dbl>, mosaik2 <dbl>,
#   mosaik3 <dbl>, mosaik4 <dbl>, mosaik5 <dbl>, mosaik6 <dbl>, mosaik7 <dbl>,
#   mosaik8 <dbl>, mosaik9 <dbl>, mosaik10 <dbl>, mosaik11 <dbl>,
#   mosaik12 <dbl>, mosaik13 <dbl>, mosaik14 <dbl>, mosaik15 <dbl>,
#   mosaik16 <dbl>, mosaik17 <dbl>, mosaik18 <dbl>, mosaik19 <dbl>,
#   mosaik20 <dbl>, mosaik21 <dbl>, mosaik22 <dbl>, mosaik23 <dbl>,
#   mosaik24 <dbl>, mosaik25 <dbl>, mosaik26 <dbl>, mosaik27 <dbl>, …

What do poverty rates look like?

Code
# levels vs. arcsin squareroot transformation
ggplot() +
  geom_density(data = adm4, aes(x = poor), fill = "navy", alpha = 0.5) +
  geom_density(data = adm4, aes(x = asin(sqrt(poor))), fill = "orange", alpha = 0.5) +
  theme_bw()
Code
# Not perfect but neither is horribly non-normal!

Before SAE, need to do some cleaning

Code
# check for missings
sum(is.na(adm4$pop))
[1] 0
Code
sum(is.na(adm4$ntl))
[1] 0
Code
sum(is.na(adm4$ndvimin))
[1] 0
Code
sum(is.na(adm4$mosaik1))
[1] 301
Code
# we have missings for the mosaik features!

Missing mosaiks, what to do?

Code
# I am going to replace missing mosaiks values with the TA mean
adm4 <- adm4 |>
  group_by(TA_CODE) |>
  mutate(across(starts_with("mosaik"), ~replace_na(., mean(., na.rm = TRUE)))) |>
  ungroup()

sum(is.na(adm4$mosaik1)) # fixed!
[1] 0

Now let’s select features

Code
library(glmnet)

# let's also log population!
adm4 <- adm4 |>
  mutate(pop = log(pop))

surveyeas <- adm4 |>
  filter(!is.na(poor))
samplefeatures <- surveyeas |>
  select(-c(EA_CODE, TA_CODE, poor, total_weights, total_obs)) # remove these variables
samplelabels <- surveyeas$poor
sampleweights <- surveyeas$total_weights

# now lasso
set.seed(24826)
lasso <- cv.glmnet(x = as.matrix(samplefeatures), y = samplelabels, 
  weights = sampleweights,
  alpha = 1)

What did lasso do?

Code
coef(lasso, s = "lambda.min")
506 x 1 sparse Matrix of class "dgCMatrix"
                      s1
(Intercept)   0.44719559
pop          -0.02712011
ntl          -0.04331957
ndvimin       .         
ndvimax       .         
ndvimean      .         
mosaik1       .         
mosaik2       .         
mosaik3       .         
mosaik4       .         
mosaik5       .         
mosaik6       .         
mosaik7       .         
mosaik8       .         
mosaik9       .         
mosaik10      .         
mosaik11      .         
mosaik12      .         
mosaik13      .         
mosaik14      .         
mosaik15      .         
mosaik16      .         
mosaik17      .         
mosaik18      .         
mosaik19      .         
mosaik20      .         
mosaik21      .         
mosaik22      .         
mosaik23      .         
mosaik24      .         
mosaik25      .         
mosaik26      .         
mosaik27      .         
mosaik28      .         
mosaik29      .         
mosaik30      .         
mosaik31      .         
mosaik32      .         
mosaik33      .         
mosaik34      .         
mosaik35      .         
mosaik36      .         
mosaik37      .         
mosaik38      .         
mosaik39      .         
mosaik40      .         
mosaik41      .         
mosaik42      .         
mosaik43      .         
mosaik44      .         
mosaik45      .         
mosaik46      .         
mosaik47      .         
mosaik48      .         
mosaik49      .         
mosaik50      .         
mosaik51      .         
mosaik52      .         
mosaik53      .         
mosaik54      .         
mosaik55      .         
mosaik56      .         
mosaik57      .         
mosaik58      .         
mosaik59      .         
mosaik60      .         
mosaik61      .         
mosaik62      .         
mosaik63      .         
mosaik64      .         
mosaik65      .         
mosaik66      .         
mosaik67      .         
mosaik68      .         
mosaik69      .         
mosaik70      .         
mosaik71      .         
mosaik72      .         
mosaik73      .         
mosaik74      .         
mosaik75      .         
mosaik76      .         
mosaik77      .         
mosaik78      .         
mosaik79      .         
mosaik80      .         
mosaik81      .         
mosaik82      .         
mosaik83      .         
mosaik84      .         
mosaik85      .         
mosaik86      .         
mosaik87      .         
mosaik88      .         
mosaik89      .         
mosaik90      .         
mosaik91      .         
mosaik92      .         
mosaik93      .         
mosaik94      .         
mosaik95      .         
mosaik96      .         
mosaik97      .         
mosaik98      .         
mosaik99      .         
mosaik100     .         
mosaik101     .         
mosaik102     .         
mosaik103     .         
mosaik104     .         
mosaik105     .         
mosaik106     .         
mosaik107     .         
mosaik108     .         
mosaik109     .         
mosaik110     .         
mosaik111     .         
mosaik112     .         
mosaik113     .         
mosaik114     .         
mosaik115     .         
mosaik116     .         
mosaik117     .         
mosaik118     .         
mosaik119     .         
mosaik120     .         
mosaik121     .         
mosaik122     .         
mosaik123     .         
mosaik124     .         
mosaik125     .         
mosaik126     .         
mosaik127     .         
mosaik128     .         
mosaik129     .         
mosaik130     .         
mosaik131     .         
mosaik132     .         
mosaik133     .         
mosaik134     .         
mosaik135     .         
mosaik136     .         
mosaik137     .         
mosaik138     .         
mosaik139     .         
mosaik140     .         
mosaik141     .         
mosaik142     .         
mosaik143     .         
mosaik144     .         
mosaik145     .         
mosaik146     .         
mosaik147     .         
mosaik148     .         
mosaik149     .         
mosaik150     .         
mosaik151     .         
mosaik152     .         
mosaik153     .         
mosaik154     .         
mosaik155     .         
mosaik156     .         
mosaik157     .         
mosaik158     .         
mosaik159     .         
mosaik160     .         
mosaik161     .         
mosaik162     .         
mosaik163     .         
mosaik164     .         
mosaik165     .         
mosaik166     .         
mosaik167     .         
mosaik168     .         
mosaik169     .         
mosaik170     .         
mosaik171     .         
mosaik172     .         
mosaik173     .         
mosaik174     .         
mosaik175     .         
mosaik176     .         
mosaik177     .         
mosaik178     .         
mosaik179     .         
mosaik180     .         
mosaik181     .         
mosaik182     .         
mosaik183     .         
mosaik184     .         
mosaik185     .         
mosaik186     .         
mosaik187     .         
mosaik188     .         
mosaik189     .         
mosaik190     .         
mosaik191     .         
mosaik192     .         
mosaik193     .         
mosaik194     .         
mosaik195     .         
mosaik196     .         
mosaik197     .         
mosaik198     .         
mosaik199     .         
mosaik200     .         
mosaik201     .         
mosaik202     .         
mosaik203     .         
mosaik204     .         
mosaik205     .         
mosaik206     .         
mosaik207     .         
mosaik208     .         
mosaik209     .         
mosaik210     .         
mosaik211     .         
mosaik212     .         
mosaik213     .         
mosaik214     .         
mosaik215     .         
mosaik216     .         
mosaik217     .         
mosaik218     .         
mosaik219     .         
mosaik220     .         
mosaik221     .         
mosaik222     .         
mosaik223     .         
mosaik224     .         
mosaik225     .         
mosaik226     .         
mosaik227     .         
mosaik228     .         
mosaik229     .         
mosaik230     .         
mosaik231     .         
mosaik232     .         
mosaik233     .         
mosaik234     .         
mosaik235     .         
mosaik236     .         
mosaik237     .         
mosaik238     .         
mosaik239     .         
mosaik240     .         
mosaik241     .         
mosaik242     .         
mosaik243     .         
mosaik244     .         
mosaik245     .         
mosaik246     .         
mosaik247     .         
mosaik248     .         
mosaik249     .         
mosaik250     .         
mosaik251     .         
mosaik252     .         
mosaik253     .         
mosaik254     .         
mosaik255     .         
mosaik256     .         
mosaik257     .         
mosaik258     .         
mosaik259     .         
mosaik260     .         
mosaik261     .         
mosaik262     .         
mosaik263     0.07503013
mosaik264     .         
mosaik265     .         
mosaik266     .         
mosaik267     .         
mosaik268     .         
mosaik269     .         
mosaik270     .         
mosaik271     .         
mosaik272     .         
mosaik273     .         
mosaik274     .         
mosaik275     .         
mosaik276     .         
mosaik277     0.02587007
mosaik278     .         
mosaik279     .         
mosaik280   -11.79217754
mosaik281     .         
mosaik282     .         
mosaik283     .         
mosaik284     .         
mosaik285     .         
mosaik286     .         
mosaik287     .         
mosaik288     .         
mosaik289     .         
mosaik290     .         
mosaik291     .         
mosaik292     .         
mosaik293    -0.03074689
mosaik294     .         
mosaik295     .         
mosaik296     .         
mosaik297     .         
mosaik298     .         
mosaik299     .         
mosaik300     .         
mosaik301     .         
mosaik302     .         
mosaik303     .         
mosaik304     .         
mosaik305     .         
mosaik306     .         
mosaik307     .         
mosaik308     .         
mosaik309     .         
mosaik310     .         
mosaik311     .         
mosaik312     .         
mosaik313     .         
mosaik314     .         
mosaik315     .         
mosaik316     .         
mosaik317     .         
mosaik318     .         
mosaik319     .         
mosaik320     .         
mosaik321     .         
mosaik322     .         
mosaik323     .         
mosaik324     .         
mosaik325     .         
mosaik326     .         
mosaik327     .         
mosaik328     .         
mosaik329     .         
mosaik330     .         
mosaik331     .         
mosaik332     .         
mosaik333     .         
mosaik334     .         
mosaik335     .         
mosaik336     .         
mosaik337     .         
mosaik338     .         
mosaik339     .         
mosaik340     .         
mosaik341     .         
mosaik342     .         
mosaik343     .         
mosaik344     .         
mosaik345     .         
mosaik346     .         
mosaik347     .         
mosaik348     .         
mosaik349     .         
mosaik350     .         
mosaik351     .         
mosaik352     .         
mosaik353     .         
mosaik354     .         
mosaik355     .         
mosaik356     .         
mosaik357     .         
mosaik358     .         
mosaik359     .         
mosaik360     .         
mosaik361     .         
mosaik362     .         
mosaik363     .         
mosaik364     .         
mosaik365     .         
mosaik366     .         
mosaik367     .         
mosaik368     .         
mosaik369     .         
mosaik370     .         
mosaik371     .         
mosaik372     .         
mosaik373     .         
mosaik374     .         
mosaik375     .         
mosaik376     .         
mosaik377     .         
mosaik378     .         
mosaik379     .         
mosaik380     .         
mosaik381     .         
mosaik382     .         
mosaik383     .         
mosaik384     .         
mosaik385     .         
mosaik386     .         
mosaik387     .         
mosaik388     .         
mosaik389     .         
mosaik390     .         
mosaik391     .         
mosaik392     .         
mosaik393     .         
mosaik394     .         
mosaik395     .         
mosaik396     .         
mosaik397     .         
mosaik398     .         
mosaik399     .         
mosaik400     .         
mosaik401     .         
mosaik402     .         
mosaik403     .         
mosaik404     .         
mosaik405     .         
mosaik406     .         
mosaik407     .         
mosaik408     .         
mosaik409     .         
mosaik410     .         
mosaik411     .         
mosaik412     .         
mosaik413     .         
mosaik414     .         
mosaik415     .         
mosaik416     .         
mosaik417     .         
mosaik418     .         
mosaik419     .         
mosaik420     .         
mosaik421     .         
mosaik422     .         
mosaik423     .         
mosaik424     .         
mosaik425     .         
mosaik426     .         
mosaik427     .         
mosaik428     .         
mosaik429     .         
mosaik430     .         
mosaik431     .         
mosaik432     .         
mosaik433     .         
mosaik434     .         
mosaik435     .         
mosaik436     .         
mosaik437     .         
mosaik438     .         
mosaik439     .         
mosaik440     .         
mosaik441     .         
mosaik442     .         
mosaik443     .         
mosaik444     .         
mosaik445     .         
mosaik446     .         
mosaik447     .         
mosaik448     .         
mosaik449     .         
mosaik450     .         
mosaik451     .         
mosaik452     .         
mosaik453     .         
mosaik454     .         
mosaik455     .         
mosaik456     .         
mosaik457     .         
mosaik458     .         
mosaik459    36.72195657
mosaik460     .         
mosaik461     .         
mosaik462     .         
mosaik463     .         
mosaik464     .         
mosaik465     .         
mosaik466     .         
mosaik467     .         
mosaik468     .         
mosaik469     .         
mosaik470     .         
mosaik471     .         
mosaik472     .         
mosaik473     .         
mosaik474     .         
mosaik475     .         
mosaik476     .         
mosaik477     .         
mosaik478     .         
mosaik479     .         
mosaik480     .         
mosaik481     .         
mosaik482     .         
mosaik483     .         
mosaik484     .         
mosaik485     .         
mosaik486     .         
mosaik487     .         
mosaik488     .         
mosaik489     .         
mosaik490     .         
mosaik491     .         
mosaik492     .         
mosaik493     .         
mosaik494     .         
mosaik495     .         
mosaik496     .         
mosaik497     .         
mosaik498     .         
mosaik499     .         
mosaik500     .         

We want to get the non-zero variables

Code
coefs <- coef(lasso, s = "lambda.min")
nonzerocoefs <- row.names(coefs)[which(coefs!=0)]
nonzerocoefs
[1] "(Intercept)" "pop"         "ntl"         "mosaik263"   "mosaik277"  
[6] "mosaik280"   "mosaik293"   "mosaik459"  
Code
# but we don't want the intercept here (you'll see why)
nonzerocoefs <- nonzerocoefs[-1]
nonzerocoefs
[1] "pop"       "ntl"       "mosaik263" "mosaik277" "mosaik280" "mosaik293"
[7] "mosaik459"

Now we want to turn that into a formula!

Code
formula <- as.formula(paste("poor ~", paste(nonzerocoefs, collapse = " + ")))
formula
poor ~ pop + ntl + mosaik263 + mosaik277 + mosaik280 + mosaik293 + 
    mosaik459
Code
# you can see it works with a simple linear regression!
lm(formula, data = surveyeas)

Call:
lm(formula = formula, data = surveyeas)

Coefficients:
(Intercept)          pop          ntl    mosaik263    mosaik277    mosaik280  
    0.35065     -0.01400     -0.04580      0.22365     -0.04598    -21.35272  
  mosaik293    mosaik459  
   -0.09561     92.93638  

Finally, we can do SAE

Code
library(povmap)

saeresults <- povmap::ebp(fixed = formula, # the formula
  pop_data = adm4,
  pop_domains = "TA_CODE",
  smp_data = surveyeas,
  smp_domains = "TA_CODE",
  transformation = "arcsin",
  weights = "total_weights",
  pop_weights = "pop",
  weights_type = "nlme",
  MSE = TRUE,
  rescale_weights = TRUE,
  seed = 1234,
  L = 0)
Time difference of 0.61 secs

The results

Code
# What is R2?
summary(saeresults)$coeff_determ
 Marginal_R2 Conditional_R2 Marginal_Area_R2 Conditional_Area_R2
   0.3135187      0.4101433        0.5493752           0.7391761
Code
# Relatively normal?
summary(saeresults)$normality
                Skewness Kurtosis Shapiro_W    Shapiro_p
Error         -0.4285633 3.271975 0.9844836 1.529020e-01
Random_effect -0.5277791 6.586239 0.9033095 2.636191e-05

Now let’s get the predictions

Code
# get the predictions
hat <- as_tibble(saeresults$ind)
hat <- hat |>
  select(TA_CODE = Domain, poor = Mean)
# We also want variance estimates
mse <- as_tibble(saeresults$MSE) |>
  select(TA_CODE = Domain, mse = Mean)
# note to get the standard error we have to take the square root!
mse <- mse |>
  mutate(se = sqrt(mse))

# finally, join them!
final <- left_join(hat, mse, by = "TA_CODE")

head(final)
# A tibble: 6 × 4
  TA_CODE  poor     mse     se
  <fct>   <dbl>   <dbl>  <dbl>
1 10101   0.366 0.00211 0.0460
2 10102   0.351 0.00443 0.0665
3 10103   0.263 0.00362 0.0602
4 10104   0.273 0.0106  0.103 
5 10105   0.377 0.00444 0.0666
6 10106   0.383 0.00704 0.0839

Let’s map it!

Code
# Admin3 (TA)
adm3 <- read_sf("rastersdata/mw3.shp")
adm3 <- adm3 |>
  left_join(final, by = "TA_CODE")

# let's output two things: poverty rate and CV (se/mean)
g1 <- ggplot() +
  geom_sf(data = adm3, aes(fill = poor), color = "white") +
  scale_fill_distiller("Poverty Rate", palette = "Spectral", direction = -1) +
  theme_bw() +
  theme(legend.position = "bottom")
g2 <- ggplot() +
  geom_sf(data = adm3, aes(fill = se/poor), color = "white") +
  scale_fill_distiller("CV", palette = "Spectral", direction = -1) +
  theme_bw() +
  theme(legend.position = "bottom")
g3 <- ggplot() +
  geom_sf(data = adm3, aes(fill = as.factor((se/poor)<=0.4)), color = "white") +
  scale_fill_brewer("CV below 0.4?", palette = "Reds", direction = -1) +
  theme_bw() +
  theme(legend.position = "bottom")

The final result!

Code
library(cowplot)

plot_grid(g1, g2, g3, ncol = 3)

And here it is with Rumphi

Code
rumphibbox <- st_bbox(adm3 |> filter(DIST_CODE=="107"))
g2new <- ggdraw() +
  draw_plot(
    {
      g1 +
        coord_sf(
          xlim = c(rumphibbox[1], rumphibbox[3]),
          ylim = c(rumphibbox[2], rumphibbox[4]),
          expand = FALSE) +
        labs(subtitle = "Rumphi only")
    }
  )
g1new <- g1 + 
  labs(subtitle = "Northern Malawi") +
  geom_sf(data = st_as_sf(st_as_sfc(rumphibbox)), fill = NA, color = "black")
plot_grid(g1new, g2new, ncol = 2)